Dual-attention Focused Module for Weakly Supervised Object Localization
The research on recognizing the most discriminative regions provides
referential information for weakly supervised object localization with only
image-level annotations. However, the most discriminative regions usually
conceal the other parts of the object, thereby impeding entire object
recognition and localization. To tackle this problem, the Dual-attention
Focused Module (DFM) is proposed to enhance object localization performance.
Specifically, we present a dual-attention module for information fusion,
consisting of a position branch and a channel branch. In each branch, the input
feature map is transformed into an enhancement map and a mask map, which
respectively highlight the most discriminative parts and hide them. For the position
mask map, we introduce a focused matrix that enhances it by exploiting the
principle that the pixels of an object are spatially continuous. Between these two
branches, the enhancement map is integrated with the mask map, aiming to
partially compensate for the lost information and to diversify the features. With
the dual-attention module and focused matrix, the entire object region could be
precisely recognized with implicit information. Experiments demonstrate that DFM
outperforms prior methods; in particular, it achieves state-of-the-art
localization accuracy on ILSVRC 2016 and CUB-200-2011. Comment: 8 pages, 6 figures, and 4 tables
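The abstract describes each branch producing an enhancement map (highlighting discriminative regions) and a mask map (hiding them), with the two fused across branches. A minimal NumPy sketch of that idea follows; the min-max normalization, the fixed threshold, and the additive cross-branch fusion are illustrative assumptions, not the paper's exact formulation, and the focused matrix is omitted.

```python
import numpy as np

def position_branch(feat, thresh=0.6):
    # feat: (C, H, W). Spatial attention = channel mean, scaled to [0, 1].
    attn = feat.mean(axis=0)
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    enhancement = feat * attn                      # highlight discriminative pixels
    mask = (attn < thresh).astype(feat.dtype)      # suppress the strongest pixels
    return enhancement, feat * mask

def channel_branch(feat, thresh=0.6):
    # Channel attention = spatial mean per channel, scaled to [0, 1].
    attn = feat.mean(axis=(1, 2))
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    enhancement = feat * attn[:, None, None]
    mask = (attn < thresh).astype(feat.dtype)[:, None, None]
    return enhancement, feat * mask

def dual_attention_fuse(feat):
    # Cross-branch fusion: each branch's enhancement map is added to the
    # other branch's mask map, partially restoring the hidden information.
    p_enh, p_mask = position_branch(feat)
    c_enh, c_mask = channel_branch(feat)
    return p_enh + c_mask, c_enh + p_mask
```

In this sketch the masked stream forces attention onto the non-discriminative remainder of the object, while the added enhancement stream from the other branch restores part of what the mask removed.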
Coded Residual Transform for Generalizable Deep Metric Learning
A fundamental challenge in deep metric learning is the generalization
capability of the feature embedding network model since the embedding network
learned on training classes needs to be evaluated on new test classes. To
address this challenge, in this paper, we introduce a new method called coded
residual transform (CRT) for deep metric learning to significantly improve its
generalization capability. Specifically, we learn a set of diversified
prototype features, project the feature map onto each prototype, and then
encode its features using their projection residuals weighted by their
correlation coefficients with each prototype. The proposed CRT method has the
following two unique characteristics. First, it represents and encodes the
feature map from a set of complementary perspectives based on projections onto
diversified prototypes. Second, unlike existing transformer-based feature
representation approaches which encode the original values of features based on
global correlation analysis, the proposed coded residual transform encodes the
relative differences between the original features and their projected
prototypes. Embedding space density and spectral decay analysis show that this
multi-perspective projection onto diversified prototypes and coded residual
representation are able to achieve significantly improved generalization
capability in metric learning. Finally, to further enhance the generalization
performance, we propose to enforce the consistency on their feature similarity
matrices between coded residual transforms with different sizes of projection
prototypes and embedding dimensions. Our extensive experimental results and
ablation studies demonstrate that the proposed CRT method outperforms
state-of-the-art deep metric learning methods by large margins, improving
upon the current best method by up to 4.28% on the CUB dataset. Comment: Accepted by NeurIPS 202
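The core CRT operation described above (project each feature onto a set of prototypes, then encode the projection residuals weighted by prototype correlation) can be sketched in NumPy as follows. The softmax weighting, the sum-pooling over features, and the final L2 normalization are illustrative assumptions; the paper's learned prototypes are replaced here by an arbitrary fixed set.

```python
import numpy as np

def coded_residual_transform(feats, prototypes):
    # feats: (N, D) local feature vectors; prototypes: (K, D) prototype features.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    corr = f @ p.T                                    # (N, K) correlation coefficients
    # Softmax over prototypes: assumed form of the correlation weighting.
    w = np.exp(corr) / np.exp(corr).sum(axis=1, keepdims=True)
    # Projection of each feature onto each prototype direction.
    proj = corr[:, :, None] * p[None, :, :]           # (N, K, D)
    residuals = f[:, None, :] - proj                  # relative differences, not raw values
    # Correlation-weighted aggregation of residuals per prototype, then flatten.
    coded = (w[:, :, None] * residuals).sum(axis=0)   # (K, D)
    embedding = coded.reshape(-1)
    return embedding / (np.linalg.norm(embedding) + 1e-8)
```

Encoding residuals rather than raw feature values is what distinguishes this scheme from the transformer-style encodings the abstract contrasts it with: the embedding captures how features deviate from the prototype set rather than the features themselves.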